Thesis for the Degree of Licentiate of Engineering
Applications of Functional Programming in Processing Formal and Natural Languages
Abstract
This thesis describes two applications of functional programming to the processing of formal and natural languages. The techniques described are closely connected to compiler construction, most evidently in the work on the BNF Converter.

The first part of the thesis describes BNFC (the BNF Converter), a multi-lingual compiler tool. BNFC takes as its input a grammar written in Labelled BNF (LBNF) notation and generates a compiler front-end (an abstract syntax, a lexer, and a parser). Furthermore, it generates a case skeleton usable as the starting point of back-end construction, a pretty printer, a test bench, and a LaTeX document usable as a language specification. The program components can be generated in Haskell, Java, C, and C++, using the standard parser and lexer tools of each language. BNFC itself is written in Haskell. The methodology used for the generated front-end is based on Appel’s books on compiler construction. BNFC has been used as a teaching tool in compiler construction courses at Chalmers. It has also been applied to research-related programming language development, and in an industrial application producing a compiler for a telecommunications protocol description language.

The second part of the thesis describes Functional Morphology, a toolkit for implementing natural language morphology in the functional language Haskell. The main idea behind it is simple: instead of working with untyped regular expressions, which are the state of the art of morphology in computational linguistics, we use finite functions over hereditarily finite algebraic data types. The definitions of these data types and functions are the language-dependent part of the morphology. The language-independent part consists of an untyped dictionary format, which is used for translation to other morphology formats, for synthesis of word forms, and for generating a decorated trie that is used for analysis.
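To illustrate what it means for BNFC to derive an abstract syntax from a labelled grammar, here is a minimal sketch built on a hypothetical three-rule expression grammar (not taken from the thesis). Each rule label becomes a constructor, and the precedence levels `Exp`, `Exp1`, `Exp2` collapse into a single type. The data type only approximates what BNFC generates for Haskell; the exact output (module names, deriving clauses) varies between BNFC versions. The evaluator `eval` and printer `prt` are hand-written stand-ins for the generated case skeleton and pretty printer, not BNFC's own code.

```haskell
-- Hypothetical LBNF grammar (illustrative only):
--   EAdd. Exp  ::= Exp  "+" Exp1 ;
--   EMul. Exp1 ::= Exp1 "*" Exp2 ;
--   EInt. Exp2 ::= Integer ;
--   coercions Exp 2 ;

-- Abstract syntax in roughly the shape BNFC derives from the rule labels:
-- one constructor per label, one type for all precedence levels of Exp.
data Exp
  = EAdd Exp Exp
  | EMul Exp Exp
  | EInt Integer
  deriving (Eq, Show)

-- A back end filled in from a case skeleton: one equation per constructor.
eval :: Exp -> Integer
eval (EAdd a b) = eval a + eval b
eval (EMul a b) = eval a * eval b
eval (EInt n)   = n

-- A precedence-aware printer. The Int argument is the level (0 = Exp,
-- 1 = Exp1, 2 = Exp2) required by the context; a term whose own level is
-- lower than required is wrapped in parentheses.
prt :: Int -> Exp -> String
prt i e = case e of
    EAdd a b -> paren 0 (prt 0 a ++ " + " ++ prt 1 b)
    EMul a b -> paren 1 (prt 1 a ++ " * " ++ prt 2 b)
    EInt n   -> show n
  where
    paren j s = if i > j then "(" ++ s ++ ")" else s
```

In GHCi, `prt 0 (EMul (EAdd (EInt 1) (EInt 2)) (EInt 3))` yields `"(1 + 2) * 3"`, and `eval` on the same term gives `9`. Because the printer's levels mirror the grammar's category levels, parentheses appear exactly where the left-associative grammar needs them.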
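The Functional Morphology idea of finite functions over hereditarily finite types, together with the untyped dictionary and the decorated trie, can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the toolkit's actual API. The type names (`Number`, `Case`, `NounForm`), the `decl1` paradigm, the `Entry` format, and the `Trie` representation are all hypothetical. The paradigm is the Latin first declension (e.g. *rosa*), with the vocative omitted and vowel length unmarked.

```haskell
import qualified Data.Map as Map

-- Language-dependent part: hypothetical parameter types for Latin nouns.
data Number = Sg | Pl
  deriving (Eq, Show, Enum, Bounded)

data Case = Nom | Gen | Dat | Acc | Abl
  deriving (Eq, Show, Enum, Bounded)

-- A noun form combines finite parameters, so all values can be enumerated.
data NounForm = NounForm Number Case
  deriving (Eq, Show)

allForms :: [NounForm]
allForms = [NounForm n c | n <- [minBound .. maxBound], c <- [minBound .. maxBound]]

showForm :: NounForm -> String
showForm (NounForm n c) = show n ++ " " ++ show c

-- An inflection table is a finite function from parameters to strings.
type Noun = NounForm -> String

-- First-declension paradigm; assumes the lemma ends in "a" (e.g. "rosa").
decl1 :: String -> Noun
decl1 lemma (NounForm n c) = init lemma ++ ending
  where
    ending = case (n, c) of
      (Sg, Nom) -> "a"
      (Sg, Gen) -> "ae"
      (Sg, Dat) -> "ae"
      (Sg, Acc) -> "am"
      (Sg, Abl) -> "a"
      (Pl, Nom) -> "ae"
      (Pl, Gen) -> "arum"
      (Pl, Dat) -> "is"
      (Pl, Acc) -> "as"
      (Pl, Abl) -> "is"

-- Language-independent part: an untyped entry pairs a lemma with
-- (word form, tag) strings obtained by enumerating the finite function.
type Entry = (String, [(String, String)])

entry :: String -> Noun -> Entry
entry lemma noun = (lemma, [(noun f, showForm f) | f <- allForms])

-- A decorated trie: each node stores the (lemma, tag) analyses of the
-- word that ends at that node, plus its children indexed by character.
data Trie = Trie [(String, String)] (Map.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie [] Map.empty

insertTrie :: String -> (String, String) -> Trie -> Trie
insertTrie [] a (Trie as cs) = Trie (a : as) cs
insertTrie (x:xs) a (Trie as cs) =
  Trie as (Map.insert x (insertTrie xs a child) cs)
  where child = Map.findWithDefault emptyTrie x cs

buildTrie :: [Entry] -> Trie
buildTrie es = foldr (uncurry insertTrie) emptyTrie
  [ (w, (lemma, tag)) | (lemma, forms) <- es, (w, tag) <- forms ]

-- Analysis: follow the word character by character and return the
-- decorations found at the final node (empty if the path does not exist).
analyse :: Trie -> String -> [(String, String)]
analyse (Trie as _) [] = as
analyse (Trie _ cs) (x:xs) = maybe [] (`analyse` xs) (Map.lookup x cs)
```

With `trie = buildTrie [entry "rosa" (decl1 "rosa")]`, `analyse trie "rosae"` returns the three readings `Sg Gen`, `Sg Dat` and `Pl Nom`, each with lemma `"rosa"`, while a bare prefix such as `"ros"` has no analyses. Synthesis reads the typed function directly; analysis uses only the untyped dictionary, which is what keeps that part language-independent.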
Functional Morphology builds on ideas introduced by Huet in his computational linguistics toolkit Zen, which he has used to implement the morphology of Sanskrit. The goal has been to make it easy for linguists who are not trained as functional programmers to apply the ideas to new languages. As evidence of the productivity of the method, morphologies for Swedish, Italian, Russian, Spanish, and Latin have already been implemented.

The four papers included in this thesis have been published previously as follows:

• Paper I: Labelled BNF: A High-Level Formalism For Defining Well-Behaved Programming Languages, Markus Forsberg & Aarne Ranta, Proceedings of the Estonian Academy of Sciences, Special issue on programming theory, NWPT’02, December 2003, pages 356–393
• Paper II (Technical Report): BNF Converter: Multilingual Front-End Generation from Labelled BNF Grammars, Michael Pellauer, Markus Forsberg & Aarne Ranta, Technical Report no. 2004-09 in Computing Science at Chalmers University of Technology and Göteborg University
• Paper III: Tool Demonstration: BNF Converter, Markus Forsberg & Aarne Ranta, Proceedings of the ACM SIGPLAN 2004 Haskell Workshop, Snowbird, Utah, USA, pages 94–95
• Paper IV: Functional Morphology, Markus Forsberg & Aarne Ranta, Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming, September 19–21, 2004, Snowbird, Utah, USA, pages 213–223